VariableOrderAccumulator #940

Merged
merged 11 commits into from
Jul 18, 2025

Member

@mhauru mhauru commented May 29, 2025

Removes the order field of Metadata in favour of having an OrderedDict{VarName,Int} in the same accumulator as num_produce (renaming NumProduceAccumulator to VariableOrderAccumulator in the process). Also adds some == methods we were previously missing.

This is currently passing tests except anything related to JET. I think JET freaks out because the OrderedDict within the new accumulator has an abstract key type. I think it's fine to have the abstract key type as long as the value type is concrete, at least once we remove VariableOrderAccumulator from the set of default accumulators and only use it when doing ParticleGibbs. I'm thus tempted to not fix the JET issues and move this whole accumulator from DPPL to Turing.jl's part that interfaces with AdvancedPS. Not sure how to handle merging this PR in that case though.
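To illustrate the "abstract key type, concrete value type" point, here is a minimal sketch. `Name` is a hypothetical stand-in for a parametric name type, not DynamicPPL's `VarName`, and the dictionary only mimics the shape of the one described above.

```julia
# Hypothetical stand-in for a parametric name type; not DynamicPPL's VarName.
struct Name{sym} end

# Keys are typed by the non-concrete `Name`, values by the concrete `Int`.
order = Dict{Name,Int}()
order[Name{:x}()] = 1
order[Name{:y}()] = 2

key_is_concrete = isconcretetype(keytype(order))    # false
value_is_concrete = isconcretetype(valtype(order))  # true
```

Anything read out of such a dictionary is known to be an `Int`, which is presumably why the non-concrete key type is considered tolerable here, even if static analysis tools still complain about it.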

Comment on lines +169 to +171
function Base.:(==)(vi1::VarInfo, vi2::VarInfo)
return (vi1.metadata == vi2.metadata && vi1.accs == vi2.accs)
end
Member Author

In making this PR I learned that the default implementation of == for structs is effectively

function Base.:(==)(vi1::VarInfo, vi2::VarInfo)
    return (vi1.metadata === vi2.metadata && vi1.accs === vi2.accs)
end

i.e. all the fields are compared with === even when calling ==. That was causing trouble with some tests that compared SimpleVarInfos with ==. So note that before this PR e.g. VarInfo() != VarInfo(), and now VarInfo() == VarInfo().
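A small self-contained sketch of the same effect, using a hypothetical `Box` type rather than `VarInfo`: without a custom method, `==` falls back to `===`, so two structs holding equal but separately allocated vectors compare as unequal.

```julia
# Hypothetical type for illustration; not part of DynamicPPL.
struct Box
    items::Vector{Int}
end

a = Box([1, 2, 3])
b = Box([1, 2, 3])

# No custom method yet: `==` falls back to `===`, and two separately
# allocated vectors are never `===`.
equal_before = (a == b)

# Compare field by field with `==` instead.
Base.:(==)(x::Box, y::Box) = x.items == y.items

equal_after = (a == b)
```

Here `equal_before` is `false` and `equal_after` is `true`. If such a type is also used as a dictionary key, a matching `Base.hash` method would normally be defined alongside `==`.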

Contributor

github-actions bot commented May 29, 2025

Benchmark Report for Commit bf5eb42

Computer Information

Julia Version 1.11.6
Commit 9615af0f269 (2025-07-09 12:58 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

|                 Model | Dimension |  AD Backend |      VarInfo Type | Linked | Eval Time / Ref Time | AD Time / Eval Time |
|-----------------------|-----------|-------------|-------------------|--------|----------------------|---------------------|
| Simple assume observe |         1 | forwarddiff |             typed |  false |                 13.5 |                 1.3 |
|           Smorgasbord |       201 | forwarddiff |             typed |  false |                763.8 |                37.7 |
|           Smorgasbord |       201 | forwarddiff | simple_namedtuple |   true |                490.7 |                49.1 |
|           Smorgasbord |       201 | forwarddiff |           untyped |   true |               1089.3 |                32.7 |
|           Smorgasbord |       201 | forwarddiff |       simple_dict |   true |               6638.3 |                23.5 |
|           Smorgasbord |       201 | reversediff |             typed |   true |               1142.8 |                37.1 |
|           Smorgasbord |       201 |    mooncake |             typed |   true |               1127.7 |                11.9 |
|    Loop univariate 1k |      1000 |    mooncake |             typed |   true |               7053.6 |                42.7 |
|       Multivariate 1k |      1000 |    mooncake |             typed |   true |               1038.1 |                 8.7 |
|   Loop univariate 10k |     10000 |    mooncake |             typed |   true |              84567.9 |                40.4 |
|      Multivariate 10k |     10000 |    mooncake |             typed |   true |               9169.2 |                 9.5 |
|               Dynamic |        10 |    mooncake |             typed |   true |                148.8 |                21.5 |
|              Submodel |         1 |    mooncake |             typed |   true |                 18.0 |                21.8 |
|                   LDA |        12 | reversediff |             typed |   true |               1190.2 |                 1.8 |

@@ -1808,13 +1800,12 @@ function BangBang.push!!(vi::VarInfo, vn::VarName, r, dist::Distribution)
[1:length(val)],
val,
[dist],
[get_num_produce(vi)],
Member Author

This is a change in behaviour: previously, calling push!! automatically set the order for a variable. Now the order is set only if the push!! takes place within tilde_assume!!. The options for handling this are:

  1. say that it's the caller's responsibility to call set_order!! after push!!. This could be fine because only ParticleGibbs cares about order.
  2. add an extra hook for accumulators for push!!, that gets called on all accumulators on every push!! call, so that they can adjust their state accordingly.

If this is only relevant for VariableOrderAccumulator then I'd lean towards 1. If it comes up with other accumulators too then 2. might be warranted.

Similar considerations apply to at least push!, merge, and subset, which after this PR might result in out-of-sync VariableOrderAccumulators.

Member

If this is only relevant for VariableOrderAccumulator then I'd lean towards 1.

I think that it's PG's responsibility to call setorder correctly, rather than DPPL, so I'd agree.

Similar considerations apply to at least push!, merge, and subset

Still think it should be handled in PG, not here. I assume that we could write functions like

function pg_push!!(...)
    vi = push!!(...)
    return setorder!!(...)
end

and make sure to always use that in the PG code?
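The wrapper idea can be sketched end to end with toy types. Everything below (`ToyVarInfo`, `toy_push`, `toy_set_order`, `pg_push`) is hypothetical and only mirrors the shape of the proposal; none of it is DynamicPPL or Turing API.

```julia
# Toy stand-ins; none of these names are DynamicPPL or Turing API.
struct ToyVarInfo
    values::Dict{Symbol,Float64}
    order::Dict{Symbol,Int}
end

# A bare push only records the value and leaves the order untouched.
toy_push(vi::ToyVarInfo, name::Symbol, value::Float64) =
    ToyVarInfo(merge(vi.values, Dict(name => value)), vi.order)

toy_set_order(vi::ToyVarInfo, name::Symbol, order::Int) =
    ToyVarInfo(vi.values, merge(vi.order, Dict(name => order)))

# The wrapper a particle sampler would call instead of the bare push.
function pg_push(vi::ToyVarInfo, name::Symbol, value::Float64, order::Int)
    vi = toy_push(vi, name, value)
    return toy_set_order(vi, name, order)
end

vi = ToyVarInfo(Dict{Symbol,Float64}(), Dict{Symbol,Int}())
bare = toy_push(vi, :x, 0.5)
wrapped = pg_push(vi, :x, 0.5, 1)

bare_has_order = haskey(bare.order, :x)        # false: order is out of sync
wrapped_has_order = haskey(wrapped.order, :x)  # true
```

The bare push leaves the order table out of sync with the stored values, which is the situation the wrapper is meant to rule out on the sampler side.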

Member Author

I'm happy with that as long as it doesn't turn out that this is a common need for accumulators. One other instance comes to mind: Currently if you have a PointwiseLogDensityAccumulator in your varinfo and you subset or merge, the pointwise log densities don't get subsetted/merged, and you end up with an accumulator that tracks different variables from the varinfo. This is inconsequential because the use of PointwiseLogDensityAccumulator is so confined to calling the function that needs it.

I'm happy to make PG deal with this, but let's keep our eyes open in case this comes up with other accumulators.

Member

PointwiseLogDensityAccumulator in your varinfo and you subset or merge

Ah, I see -- this would be true in the past as well with PointwiseLogDensityContext tracking different things from the subsetted varinfo, right?

Member Author

Yep. I don't think PLDAccumulator by itself is a good enough argument for making these subset and merge functions, but it just made me wonder if this is a more common pattern with accumulators than we would at first assume. Easy to leave them out now and add them later if needed though.

Member

Previously calling push!! automatically set the order for a variable. Now order is set only if the push!! takes place within tilde_assume!!

Having looked at lots of this code in more detail recently, I don't think there is actually anywhere in the codebase that uses push!! outside of tilde_assume!!. (There are some tests, but we can trivially change the tests to match this new behaviour.) Do you know of any?

Member Author

Can easily believe that that's the only place.

@mhauru mhauru requested a review from penelopeysm May 29, 2025 16:00
Member Author

mhauru commented May 29, 2025

Benchmark times indicate a horrendous loss of type stability. Will investigate, probably tomorrow.

@penelopeysm
Member

Not sure how to handle merging this PR in that case though.

I had similar problems with other PRs. How about this?

  1. Make sure we're happy with the code, then drop it from the default accumulators and release a new minor version of DPPL. This will break upstream PG
  2. Fix PG to work with it, release new version of Turing
  3. Find code that can be moved from DPPL and move it to Turing

Member Author

mhauru commented May 30, 2025

Is there a particular reason to first drop it from default accumulators and then move it to Turing.jl, rather than doing both in one go?

Also, regardless of what we do, I would develop the corresponding Turing.jl release in parallel, to avoid having to make a lot of patch DPPL releases when we realise we are missing something. I've started that work in TuringLang/Turing.jl#2550, but not yet for VariableOrderAccumulator.

Member

penelopeysm commented May 30, 2025

Because it's annoyingly difficult to make Turing CI run with an unreleased version of DPPL, short of committing a test/Manifest.toml. There's the new [sources] thing that lets you point to unreleased versions, but it's 1.11 only, so the 1.10 tests will still need a Manifest. But I suppose if you're willing to run tests locally, that's fine (and maybe now that the tests are faster it's less unpalatable -- I've always hated running tests locally because of various reasons, the time being one of them, fiddling with imports and stuff being another).

(I don't think patch releases are really problematic, but there is always the possibility of having to make multiple minor releases to fix bugs, so I see the point)

Member Author

mhauru commented May 30, 2025

The performance problem turned out not to be type stability, but rather that every call to unflatten (which happens with every call to logdensity) resulted in a call to deepcopy(::OrderedDict) in VariableOrderAccumulator. And those, it seems, are really slow. I've replaced OrderedDict with Dict (didn't really need the ordering anyway) and started to use copy rather than deepcopy; let's see what that does to the benchmarks. (Seems like it makes them crash...)

Two thoughts:

  • We should probably go over the codebase and replace a lot of uses of deepcopy with copy, because deepcopy is bad practice.
  • VariableOrderAccumulator would be another use of VarNameTuple or some such data structure.
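For reference, a minimal sketch of the difference between `copy` and `deepcopy` on a `Dict`, using plain `Symbol` keys as a stand-in for varnames. It shows semantics only and makes no claim about timings.

```julia
order = Dict(:x => 1, :y => 2)

shallow = copy(order)    # new Dict; keys and values are shared, not cloned
deep = deepcopy(order)   # recursively clones everything reachable

shallow[:z] = 3          # adding to the copy does not touch `order`
original_has_z = haskey(order, :z)

# Caveat: with mutable values, a shallow copy still shares them.
nested = Dict(:x => [1])
nested_copy = copy(nested)
push!(nested_copy[:x], 2)
shared_vector = nested[:x]  # the original sees the push
```

With immutable values such as `Int`, `copy` already gives an independent dictionary, so `deepcopy` buys nothing extra in that case.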

@penelopeysm
Member

I'm just going through my list of supposed-to-review PRs and clearing them. Feel free to ping me again whenever you feel this is ready

Contributor

github-actions bot commented Jul 9, 2025

DynamicPPL.jl documentation for PR #940 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR940/

@penelopeysm penelopeysm mentioned this pull request Jul 10, 2025

codecov bot commented Jul 15, 2025

Codecov Report

Attention: Patch coverage is 75.53191% with 23 lines in your changes missing coverage. Please review.

Project coverage is 82.07%. Comparing base (cba604b) to head (bf5eb42).
Report is 1 commits behind head on breaking.

Files with missing lines Patch % Lines
src/default_accumulators.jl 75.00% 10 Missing ⚠️
src/varinfo.jl 88.00% 3 Missing ⚠️
src/extract_priors.jl 50.00% 2 Missing ⚠️
src/pointwise_logdensities.jl 0.00% 2 Missing ⚠️
src/threadsafe.jl 0.00% 2 Missing ⚠️
src/values_as_in_model.jl 0.00% 2 Missing ⚠️
src/abstract_varinfo.jl 90.90% 1 Missing ⚠️
src/accumulators.jl 66.66% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           breaking     #940      +/-   ##
============================================
- Coverage     82.58%   82.07%   -0.51%     
============================================
  Files            38       38              
  Lines          4007     4023      +16     
============================================
- Hits           3309     3302       -7     
- Misses          698      721      +23     

@mhauru mhauru marked this pull request as ready for review July 15, 2025 14:45
@mhauru mhauru requested a review from penelopeysm July 15, 2025 15:16
Member

@penelopeysm penelopeysm left a comment

Looks good although I think you may have to update the implementation of set_retained_vns_del! as it still uses vi.orders or metadata.{sym}.orders.

Member Author

mhauru commented Jul 18, 2025

Good spot, fixed set_retained_vns_del! and added a test that actually hits its main branch. The new implementation is simpler code-wise, probably a bit slower, but I don't think that matters.

@mhauru mhauru requested a review from penelopeysm July 18, 2025 13:40
Member Author

mhauru commented Jul 18, 2025

For ease of comparison, here's the latest benchmark run:

|                 Model | Dimension |  AD Backend |      VarInfo Type | Linked | Eval Time / Ref Time | AD Time / Eval Time |
|-----------------------|-----------|-------------|-------------------|--------|----------------------|---------------------|
| Simple assume observe |         1 | forwarddiff |             typed |  false |                 13.5 |                 1.3 |
|           Smorgasbord |       201 | forwarddiff |             typed |  false |                763.8 |                37.7 |
|           Smorgasbord |       201 | forwarddiff | simple_namedtuple |   true |                490.7 |                49.1 |
|           Smorgasbord |       201 | forwarddiff |           untyped |   true |               1089.3 |                32.7 |
|           Smorgasbord |       201 | forwarddiff |       simple_dict |   true |               6638.3 |                23.5 |
|           Smorgasbord |       201 | reversediff |             typed |   true |               1142.8 |                37.1 |
|           Smorgasbord |       201 |    mooncake |             typed |   true |               1127.7 |                11.9 |
|    Loop univariate 1k |      1000 |    mooncake |             typed |   true |               7053.6 |                42.7 |
|       Multivariate 1k |      1000 |    mooncake |             typed |   true |               1038.1 |                 8.7 |
|   Loop univariate 10k |     10000 |    mooncake |             typed |   true |              84567.9 |                40.4 |
|      Multivariate 10k |     10000 |    mooncake |             typed |   true |               9169.2 |                 9.5 |
|               Dynamic |        10 |    mooncake |             typed |   true |                148.8 |                21.5 |
|              Submodel |         1 |    mooncake |             typed |   true |                 18.0 |                21.8 |
|                   LDA |        12 | reversediff |             typed |   true |               1190.2 |                 1.8 |

And here's the same thing from before these changes, on breaking:

|                 Model | Dimension |  AD Backend |      VarInfo Type | Linked | Eval Time / Ref Time | AD Time / Eval Time |
|-----------------------|-----------|-------------|-------------------|--------|----------------------|---------------------|
| Simple assume observe |         1 | forwarddiff |             typed |  false |                  8.5 |                 1.5 |
|           Smorgasbord |       201 | forwarddiff |             typed |  false |                621.6 |                39.1 |
|           Smorgasbord |       201 | forwarddiff | simple_namedtuple |   true |                399.6 |                50.6 |
|           Smorgasbord |       201 | forwarddiff |           untyped |   true |                997.4 |                33.1 |
|           Smorgasbord |       201 | forwarddiff |       simple_dict |   true |               5960.9 |                24.4 |
|           Smorgasbord |       201 | reversediff |             typed |   true |                999.1 |                39.0 |
|           Smorgasbord |       201 |    mooncake |             typed |   true |                970.0 |                 4.3 |
|    Loop univariate 1k |      1000 |    mooncake |             typed |   true |               5559.2 |                 3.9 |
|       Multivariate 1k |      1000 |    mooncake |             typed |   true |                922.3 |                 9.1 |
|   Loop univariate 10k |     10000 |    mooncake |             typed |   true |              62436.9 |                 3.5 |
|      Multivariate 10k |     10000 |    mooncake |             typed |   true |               8084.5 |                10.0 |
|               Dynamic |        10 |    mooncake |             typed |   true |                133.3 |                11.3 |
|              Submodel |         1 |    mooncake |             typed |   true |                 13.2 |                 6.0 |
|                   LDA |        12 | reversediff |             typed |   true |               1155.3 |                 4.4 |

There are substantial though not massive slowdowns across the board, and an especially significant hit for a model with a lot of varnames. I think this is fine once we make sure that VariableOrderAccumulator is only used when running a particle sampler, but it might be an argument for not releasing a version of Turing.jl that uses accumulators until that is done.
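To put rough numbers on the slowdowns, the snippet below computes the ratio of the "Eval Time / Ref Time" column between the two tables for a handful of rows. The figures are copied from the tables above; the row selection and variable names are only for illustration.

```julia
# (this PR, breaking) "Eval Time / Ref Time" pairs copied from the two tables
eval_times = [
    "Simple assume observe" => (13.5, 8.5),
    "Smorgasbord (typed, forwarddiff, unlinked)" => (763.8, 621.6),
    "Loop univariate 1k" => (7053.6, 5559.2),
    "Loop univariate 10k" => (84567.9, 62436.9),
    "Multivariate 10k" => (9169.2, 8084.5),
    "LDA" => (1190.2, 1155.3),
]

# Slowdown factor of this PR relative to breaking, rounded to two decimals.
slowdowns = [name => round(pr / base; digits=2) for (name, (pr, base)) in eval_times]

foreach(println, slowdowns)
```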

Member

@penelopeysm penelopeysm left a comment

Neat :)

@mhauru mhauru merged commit f4dd46a into breaking Jul 18, 2025
18 of 21 checks passed
@mhauru mhauru deleted the mhauru/order-accumulator branch July 18, 2025 15:33
penelopeysm added a commit that referenced this pull request Jul 20, 2025
This should have been changed in #940, but slipped through as the file
wasn't listed as one of the changed files.